Query Translation and Expansion for Searching Normal and OCR-Degraded Arabic Text
Internal identifier: 000922 (Main/Exploration); previous: 000921; next: 000923
Authors: Tarek Elghazaly [Egypt]; Aly Fahmy [Egypt]
Source:
- Lecture Notes in Computer Science [0302-9743]; 2009.
Abstract
This paper presents a novel model for English/Arabic query translation to search Arabic text, and then expands the Arabic query to handle OCR-degraded Arabic text. The model covers detecting and translating word collocations, translating single words, transliterating names, and disambiguating translation and transliteration through different approaches. It also expands the query with the expected OCR errors generated by the Arabic OCR-error simulation model proposed in the paper. The query translation and expansion model is supported by several resources proposed in the paper, such as a Word Collocations Dictionary, Single Words Dictionaries, a Modern Arabic corpus, and other tools. The model achieves high accuracy in translating queries from English to Arabic, resolving translation and transliteration ambiguities, and, with orthographic query expansion, achieves a high degree of accuracy in handling OCR errors.
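As a rough illustration of what orthographic query expansion can look like, the following minimal sketch generates variants of a query term from a character confusion table. It is not the authors' model: the `CONFUSIONS` table, the `expand_query_term` function, and the `max_variants` cap are all hypothetical, and the confusion entries are illustrative assumptions rather than output of the OCR-error simulation model described in the paper.

```python
from itertools import product

# Hypothetical confusion table (an assumption, not data from the paper):
# each character maps to the strings an OCR engine might emit in its place.
# Every list starts with the character itself so the original term comes first.
CONFUSIONS = {
    "ب": ["ب", "ت", "ث"],  # letters that differ only by their dots
    "ح": ["ح", "ج", "خ"],
    "ة": ["ة", "ه"],
}


def expand_query_term(term, confusions=CONFUSIONS, max_variants=50):
    """Return the term followed by its OCR-style variants (original first)."""
    # For each character, list the possible substitutions (or just itself).
    options = [confusions.get(ch, [ch]) for ch in term]
    variants = []
    for combo in product(*options):
        candidate = "".join(combo)
        if candidate not in variants:
            variants.append(candidate)
        if len(variants) >= max_variants:
            break  # cap the expansion so long terms do not explode
    return variants


if __name__ == "__main__":
    for variant in expand_query_term("كتاب"):
        print(variant)
```

The expanded terms would then typically be combined with OR in the search query, so that a document containing a misrecognized form of the word can still match. In practice the table would be derived from observed OCR errors and weighted by frequency, which this sketch does not attempt.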
Url:
DOI: 10.1007/978-3-642-00382-0_39
Affiliations:
Links toward previous steps (curation, corpus...)
- to stream Istex, to step Corpus: 000024
- to stream Istex, to step Curation: 000024
- to stream Istex, to step Checkpoint: 000444
- to stream Main, to step Merge: 000930
- to stream Main, to step Curation: 000922
The document in XML format
<record><TEI wicri:istexFullTextTei="biblStruct"><teiHeader><fileDesc><titleStmt><title xml:lang="en">Query Translation and Expansion for Searching Normal and OCR-Degraded Arabic Text</title>
<author><name sortKey="Elghazaly, Tarek" sort="Elghazaly, Tarek" uniqKey="Elghazaly T" first="Tarek" last="Elghazaly">Tarek Elghazaly</name>
</author>
<author><name sortKey="Fahmy, Aly" sort="Fahmy, Aly" uniqKey="Fahmy A" first="Aly" last="Fahmy">Aly Fahmy</name>
</author>
</titleStmt>
<publicationStmt><idno type="wicri:source">ISTEX</idno>
<idno type="RBID">ISTEX:B57F71A720C3286257DF773DF9FD0D9AA1EB23F2</idno>
<date when="2009" year="2009">2009</date>
<idno type="doi">10.1007/978-3-642-00382-0_39</idno>
<idno type="url">https://api.istex.fr/document/B57F71A720C3286257DF773DF9FD0D9AA1EB23F2/fulltext/pdf</idno>
<idno type="wicri:Area/Istex/Corpus">000024</idno>
<idno type="wicri:Area/Istex/Curation">000024</idno>
<idno type="wicri:Area/Istex/Checkpoint">000444</idno>
<idno type="wicri:doubleKey">0302-9743:2009:Elghazaly T:query:translation:and</idno>
<idno type="wicri:Area/Main/Merge">000930</idno>
<idno type="wicri:Area/Main/Curation">000922</idno>
<idno type="wicri:Area/Main/Exploration">000922</idno>
</publicationStmt>
<sourceDesc><biblStruct><analytic><title level="a" type="main" xml:lang="en">Query Translation and Expansion for Searching Normal and OCR-Degraded Arabic Text</title>
<author><name sortKey="Elghazaly, Tarek" sort="Elghazaly, Tarek" uniqKey="Elghazaly T" first="Tarek" last="Elghazaly">Tarek Elghazaly</name>
<affiliation wicri:level="1"><country xml:lang="fr">Égypte</country>
<wicri:regionArea>Faculty of Computers and Information, Cairo University, Giza</wicri:regionArea>
<wicri:noRegion>Giza</wicri:noRegion>
</affiliation>
<affiliation wicri:level="1"><country wicri:rule="url">Égypte</country>
</affiliation>
</author>
<author><name sortKey="Fahmy, Aly" sort="Fahmy, Aly" uniqKey="Fahmy A" first="Aly" last="Fahmy">Aly Fahmy</name>
<affiliation wicri:level="1"><country xml:lang="fr">Égypte</country>
<wicri:regionArea>Faculty of Computers and Information, Cairo University, Giza</wicri:regionArea>
<wicri:noRegion>Giza</wicri:noRegion>
</affiliation>
<affiliation wicri:level="1"><country wicri:rule="url">Égypte</country>
</affiliation>
</author>
</analytic>
<monogr></monogr>
<series><title level="s">Lecture Notes in Computer Science</title>
<imprint><date>2009</date>
</imprint>
<idno type="ISSN">0302-9743</idno>
<idno type="eISSN">1611-3349</idno>
<idno type="ISSN">0302-9743</idno>
</series>
<idno type="istex">B57F71A720C3286257DF773DF9FD0D9AA1EB23F2</idno>
<idno type="DOI">10.1007/978-3-642-00382-0_39</idno>
<idno type="ChapterID">39</idno>
<idno type="ChapterID">Chap39</idno>
</biblStruct>
</sourceDesc>
<seriesStmt><idno type="ISSN">0302-9743</idno>
</seriesStmt>
</fileDesc>
<profileDesc><textClass></textClass>
<langUsage><language ident="en">en</language>
</langUsage>
</profileDesc>
</teiHeader>
<front><div type="abstract" xml:lang="en">Abstract: This paper presents a novel model for English/Arabic query translation to search Arabic text, and then expands the Arabic query to handle OCR-degraded Arabic text. The model covers detecting and translating word collocations, translating single words, transliterating names, and disambiguating translation and transliteration through different approaches. It also expands the query with the expected OCR errors generated by the Arabic OCR-error simulation model proposed in the paper. The query translation and expansion model is supported by several resources proposed in the paper, such as a Word Collocations Dictionary, Single Words Dictionaries, a Modern Arabic corpus, and other tools. The model achieves high accuracy in translating queries from English to Arabic, resolving translation and transliteration ambiguities, and, with orthographic query expansion, achieves a high degree of accuracy in handling OCR errors.</div>
</front>
</TEI>
<affiliations><list><country><li>Égypte</li>
</country>
</list>
<tree><country name="Égypte"><noRegion><name sortKey="Elghazaly, Tarek" sort="Elghazaly, Tarek" uniqKey="Elghazaly T" first="Tarek" last="Elghazaly">Tarek Elghazaly</name>
</noRegion>
<name sortKey="Elghazaly, Tarek" sort="Elghazaly, Tarek" uniqKey="Elghazaly T" first="Tarek" last="Elghazaly">Tarek Elghazaly</name>
<name sortKey="Fahmy, Aly" sort="Fahmy, Aly" uniqKey="Fahmy A" first="Aly" last="Fahmy">Aly Fahmy</name>
<name sortKey="Fahmy, Aly" sort="Fahmy, Aly" uniqKey="Fahmy A" first="Aly" last="Fahmy">Aly Fahmy</name>
</country>
</tree>
</affiliations>
</record>
To manipulate this document under Unix (Dilib)
EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/OcrV1/Data/Main/Exploration
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000922 | SxmlIndent | more
Or
HfdSelect -h $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd -nk 000922 | SxmlIndent | more
To link to this page within the Wicri network
{{Explor lien |wiki= Ticri/CIDE |area= OcrV1 |flux= Main |étape= Exploration |type= RBID |clé= ISTEX:B57F71A720C3286257DF773DF9FD0D9AA1EB23F2 |texte= Query Translation and Expansion for Searching Normal and OCR-Degraded Arabic Text }}
This area was generated with Dilib version V0.6.32.